Parallel Classification for Data Mining on Shared-Memory Multiprocessors

نویسندگان

  • Mohammed J. Zaki
  • C. T. Howard Ho
  • Rakesh Agrawal
چکیده

We present parallel algorithms for building decision-tree classifiers on shared-memory multiprocessor (SMP) systems. The proposed algorithms span the gamut of data and task parallelism. The data parallelism is based on attribute scheduling among processors. This basic scheme is extended with task pipelining and dynamic load balancing to yield faster implementations. The task parallel approach uses dynamic subtree partitioning among processors. Our performance evaluation shows that the construction of a decision-tree classifier can be effectively parallelized on an SMP machine with good speedup.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Scalable Data Mining for Rules

Data Mining is the process of automatic extraction of novel, useful, and understandable patterns in very large databases. High-performance scalable and parallel computing is crucial for ensuring system scalability and interactivity as datasets grow inexorably in size and complexity. This thesis deals with both the algorithmic and systems aspects of scalable and parallel data mining algorithms a...

متن کامل

Decision Tree Construction for Data Mining on Cluster of Shared-Memory Multiprocessors

Classification of very large datasets is a challenging problem in data mining. It is desirable to have decision-tree classifiers that can handle large datasets, because a large dataset often increases the accuracy of the resulting classification model. Classification tree algorithms can benefit from parallelization because of large memory and computation requirements for handling large datasets...

متن کامل

Towards a Cost-Effective Parallel Data Mining Approach

Massive rule induction has recently emerged as one of the powerful data mining techniques. The problem is known to be exponential in the size of the attributes, and given its ever increasing use, can greatly benefit from parallelization. In this paper, we study cost-effective approaches to parallelize rule generation algorithms. In particular, we consider the propositional rule generation algor...

متن کامل

Decision Trees on Parallel Processors

A framework for induction of decision trees suitable for implementation on shared-and distributed-memory multiprocessors or networks of workstations is described. The approach , called Parallel Decision Trees (PDT), overcomes limitations of equivalent serial algorithms that have been reported by several researchers, and enables the use of the very-large-scale training sets that are increasingly...

متن کامل

Parallel K-Means Algorithm for Shared Memory Multiprocessors

Clustering is the task of assigning a set of instances into groups in such a way that is dissimilarity of instances within each group is minimized. Clustering is widely used in several areas such as data mining, pattern recognition, machine learning, image processing, computer vision and etc. K-means is a popular clustering algorithm which partitions instances into a fixed number clusters in an...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999